Method of Selecting Training Data to Build a Compact and Efficient Translation Model

Authors

Keiji Yasuda

Ruiqiang Zhang

Hirofumi Yamamoto

Eiichiro Sumita

Abstract

Statistical translation model training requires parallel corpora that match the target task. However, training corpora sometimes contain both sentences that match the target task and sentences that do not. In such cases, training set selection can reduce the size of the translation model. In this paper, we propose a training set selection method for translation model training that uses linear translation model interpolation and a language model technique. According to the experimental results, the proposed method reduces the translation model size by 50% and improves the BLEU score by 1.76% compared with a baseline that uses the training corpus without selection.
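The abstract names two ingredients, a language-model-based selection step and linear translation model interpolation, but gives no details. The sketch below illustrates the general idea under our own assumptions and is not the authors' implementation. An add-one smoothed unigram model trained on target-task source sentences stands in for a real n-gram language model. It scores each candidate sentence pair by per-token cross-entropy and keeps the lowest-scoring fraction. Phrase translation tables are then combined as p(e|f) = λ·p_in(e|f) + (1 − λ)·p_sel(e|f). The function names `train_unigram_lm`, `cross_entropy`, `select_pairs`, and `interpolate_tables`, and the parameters `keep_ratio` and `lam`, are hypothetical.

```python
import math
from collections import Counter


def train_unigram_lm(sentences):
    """Add-one smoothed unigram LM over whitespace tokens.

    Hypothetical stand-in for a real n-gram language model trained on
    target-task (in-domain) source sentences.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab_size))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token negative log-likelihood in nats; lower means closer to the target task."""
    toks = sentence.split()
    if not toks:
        return float("inf")
    return -sum(logprob(t) for t in toks) / len(toks)


def select_pairs(in_domain_src, pool, keep_ratio=0.5):
    """Keep the keep_ratio fraction of (src, tgt) pairs whose source side scores lowest.

    keep_ratio is an assumed tuning knob, not a value taken from the paper.
    """
    lm = train_unigram_lm(in_domain_src)
    ranked = sorted(pool, key=lambda pair: cross_entropy(lm, pair[0]))
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]


def interpolate_tables(table_in, table_sel, lam=0.5):
    """Linear interpolation of two phrase tables keyed by (f, e).

    p(e|f) = lam * p_in(e|f) + (1 - lam) * p_sel(e|f); a phrase pair missing
    from one table contributes probability 0 from that table.
    """
    merged = {}
    for key in set(table_in) | set(table_sel):
        merged[key] = lam * table_in.get(key, 0.0) + (1 - lam) * table_sel.get(key, 0.0)
    return merged
```

If both input tables are normalized conditional distributions over e for each f, the interpolated table is normalized too. In practice λ would presumably be tuned on a held-out target-task set. That is an assumption here, not something the abstract states.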

Similar resources

APPLICATION OF DEA FOR SELECTING MOST EFFICIENT INFORMATION SYSTEM PROJECT WITH IMPRECISE DATA

The selection of the best Information System (IS) project from many competing proposals is a critical business activity which is very helpful to all organizations. Previous IS project selection methods are useful but have restricted application because they handle only cases with precise data. Indeed, these methods are based on precise data with less emphasis on imprecise data. This paper pro...

Method of Selecting Training Sets to Build Compact and Efficient Statistical Language Model

For statistical language model training, target task matched corpora are required. However, training corpora sometimes include both target task matched and unmatched sentences. In such a case, training set selection is effective for both model size reduction and model performance improvement. In this paper, a training set selection method for statistical language model training is described. The ...

PREDICTION OF SLOPE STABILITY STATE FOR CIRCULAR FAILURE: A HYBRID SUPPORT VECTOR MACHINE WITH HARMONY SEARCH ALGORITHM

The slope stability analysis is routinely performed by engineers to estimate the stability of river training works, road embankments, embankment dams, excavations and retaining walls. This paper presents a new approach to build a model for the prediction of slope stability state. The support vector machine (SVM) is a new machine learning method based on statistical learning theory, which can so...

Intelligent Selection of Language Model Training Data

We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach is based on comparing the cross-entropy, according to domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces bet...

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...


Publication date: 2008